InternVL3-78B-Instruct is an advanced multimodal large language model developed by OpenGVLab, demonstrating exceptional multimodal perception and reasoning capabilities, supporting various tasks such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.
Image-to-Text
Transformers Other